Abstract: In this paper, we address a key issue of
designing architectures and algorithms which generate
optimal demand response in a decentralized manner for a
smart-grid consisting of several stochastic renewables and
dynamic loads. By optimal demand response, we refer to the
demand response which maximizes the utility of the agents
connected to the smart-grid. By decentralized we refer to
the desirable case where neither the independent system
operator (ISO) needs to know the dynamics/utilities of the
agents, nor do the agents need to have a knowledge of the
dynamics/utilities of other agents connected to the grid. The
communication between the ISO and agents is restricted
to the ISO announcing a pricing policy and the agents
responding with their energy generation/consumption bids
in response to the pricing policy.

We provide a complete solution for both the deterministic
and stochastic cases. It features a price iteration scheme
that results in optimality of social welfare. We also provide
an optimal solution for the case where there is a common
randomness affecting and observed by all agents. This
solution can be computationally complex, and we propose
approximations. For the more general partially observed
randomness case, we exhibit a relaxation that significantly
reduces complexity. We also provide an approximation
strategy that leads to a model predictive control (MPC) approach.
Simulation results comparing the resulting optimal demand
response with the existing architectures employed by the
ISO illustrate the benefit in social welfare utility realized by
our scheme. To the best of the authors’ knowledge, this is
the first work of its kind to explicitly mark out the optimal
response of dynamic demand.

I. INTRODUCTION 

Traditionally, given demand (or a demand forecast),
generation (or the planned generation) has been dispatched
so as to balance demand in power systems. Since
there are many generators capable of producing power at 
different cost curves, it is desirable to allocate the total 
power generation among the generators so that the total 
cost of generating the required power is minimized. This 
role of determining which generators are selected, and 
how much power they produce, has been played by the 
Independent System Operator (ISO). 

This economic dispatch of generators is done by
soliciting production bids (power vs. price curves) from
each generator, and then, given the demand, choosing
the generators such that the overall cost to purchase
the power is minimized. Motivated by the problem of
integrating renewable power generation from sources
such as wind and solar [1], we consider the problem of
demand response, i.e., adjusting demand so that it can be
part of the flexibility to match the intermittent generation
[2]. The process of increasing or decreasing demand can
be accomplished by decreasing or increasing the price
of power. Thus the problem becomes one of choosing
the level of demand as well as allocating the required
generation among the various producers of power from
fossil fuel sources such as coal or gas.

An additional consideration is that both power generators
and loads are dynamic systems. Generators, for
example, have ramping constraints as well as maximum
power constraints. Similarly, analysis of load data shows
that loads also have certain dynamic constraints [3].
Thus, all variables, including price, generator
power outputs, and loads, are functions of time.

A third consideration is that renewable power
production is uncertain, which we model as a stochastic
process. Thus, in addressing the dynamic evolution of the
future demand as well as future non-renewable power
generation over time, one needs to take future uncertainty
of renewable power generation into account.

In this paper, we consider the resulting overall problem
faced by the ISO: Given stochastic renewable generation
which is disclosed causally in time, how should an ISO
choose the price causally as a function of time, and
thereby the level of demand response elicited, and then
allocate the net remaining generation among various
conventional generators? A further consideration, which we
will neglect in this paper, is that the network constraints
must be satisfied. However, this can be incorporated into the
setup that we develop in this paper.

There are some complexities that are involved in the
problem, and some constraints that a desirable solution
has to satisfy. It is desirable that the communication from
the ISO to the generators or loads contain only the price of
power in each market interval. It would also be desirable
that the communication from each generator and load to
the ISO at each time instant be only the resulting
power-price point for desired production and demand response,
respectively. In particular, we would not like to require
that each load provide a dynamic model of its behavior
as well as its utility function to the ISO, and similarly for
each generator. Not only is this a lot of information that
the agents may not want to disclose for reasons of privacy or
business, but it also implies that the ISO must solve a gigantic
optimization problem encompassing every single agent in
the system with all agents’ utility functions.

Thus we would like the system-wide optimization to be
conducted by the agents themselves in a distributed
manner, coupled only by the price announced by the ISO.
Additional complexities are introduced by the fact that the
agents are dynamic systems as well as the fact that the
renewable power generation is stochastic. Our approach
attempts to provide a comprehensive solution that takes
into consideration all individual agents’ utility functions.
It has one feature that it would be desirable to eliminate.
At each time our approach requires that the ISO interact
with the agents in an extended transaction. The ISO
announces a tentative price, and the agents respond with
power generation and consumption levels. Based on these
responses, the ISO again announces a tentative price, and
so on. This process continues until the price converges. We
see no way to eliminate this bidding process, and wonder
whether doing so is impossible in general, since the agents
need to somehow convey to the other agents how they
respond to prices. At any rate, this is a significant open
problem.

We examine a sequence of models: a deterministic
model, a common completely observed randomness
model, and a partially observed randomness model. For
the deterministic model our solution is complete and
leads to social welfare optimality. Next we consider the
case where there is a common randomness affecting all
agents and observed by each of them. This of course
includes the case where only certain portions of the
overall randomness affect individual agents, and in
different ways. In this case we propose a complete solution,
in principle, which leads to social welfare optimality
of the utility. In this case, we also propose a scheme
to reduce complexity that leads to a model predictive
approach. In the most general case where the different
agents experience differing randomness not observable
by other agents, we propose a relaxation that leads to
a significant reduction in complexity. We also show that
our MPC approach extends to this case. We also present
simulations comparing our algorithms with the approach
used presently by the ISO and illustrate the improvement
in utility that can be realized.

Overall, our approach provides a theory for the
operation of the ISO in an environment where it is needed
to integrate stochastic renewables, demand response,
and dynamic and other constraints as well as
uncertainties in both generation and loads. The solution
requires communication only of prices and energy
production/consumption responses, as is desirable. Specifically
it is not necessary for the ISO to be aware of the
dynamics of producers or consumers or their individual utility
functions. Similarly, the various agents need only know
their own dynamic system models and states and utility
function, and need not be aware of any attributes of other
users, or even the existence of other users. We hope our
approach leads to a firm foundation for the operation of
the next generation ISO.

One approach to procuring Demand Response is direct
load control, where based on an agreement between
customers and the aggregator or utility, remote control of
certain devices (e.g., air conditioners, pumps, etc.) is used
to manage their energy consumption [4]. The lack of user
privacy is a major barrier to large scale implementation.
The alternative is an incentive based approach such as
real-time pricing where users voluntarily manage their
consumption in response to a time-varying retail tariff
[5]. The challenge with this approach is the difficulty
faced by customers in manually responding to time-varying
prices. We envision a scenario where an intelligent agent
automatically manages the energy consumption
scheduling of the customer based on certain cost and comfort
settings selected by the user. The key issue in this case is
that of designing an architecture which yields optimal demand
response in a decentralized manner while maintaining
user privacy. From the system operator point of view
the challenge is to optimally balance the system when
uncertainty arises both in supply and demand, without
resorting to the brute force and expensive option of
procuring large amounts of reserve and/or energy storage
[6].

We realize that the current smart grid suffers from the
key issues of renewable penetration and electricity price
fluctuations. Our approach of modeling the users by a
dynamical system in lieu of following a “static optimization”
is the key to generating demand response that plays the
dual role of mitigating renewable penetration and price
fluctuation via utility optimization. Thus our goal is to
provide a framework in which Demand Response could
participate in both energy and ancillary service markets
[7], [8]. Of course, this demand response needs to be
optimized in order to achieve our pre-set goals. This
is done via utilization of the computational power and
latent energy storage that is present in the smart users
connected to the smart grid.

The paper is organized as follows: we begin with a
survey of some related works in Section II, fix notation in
Section III, and give a complete description of the problem
in Section IV. This is followed by a discussion on iterative
bidding schemes and the ensuing optimal demand response in
a deterministic setting (Section V) and a stochastic setting
(Section VI). Finally simulation results are shown
(Section VII), which support our theory.

II. LITERATURE SURVEY 

There have been many efforts since the deregulation
of the electricity sector on a market-based framework to
clear the system. Ilic et al. [9] proposed a two-layered
approach that internalizes individual constraints of market
participants while allowing the ISO to manage the spatial
complexity. The approximated MPC algorithm is shown to
work in many realistic cases.

In order to analyze the strategic interactions between
the ISO and market participants, game theoretical
approaches have been proposed in a number of papers.
Zhu et al. [10] use a Stackelberg game framework for
economic dispatch with demand response. The approach
uses a two person game with the ISO as leader and users
aggregated into a second player. The users change their
demand based on the price signal so as to maximize their
payoff function. The ED problem considered is a single
time interval conventional dispatch without transmission
line constraints. Bu and Yu [11] model the interactions
between electricity retailers and customers as a
Stackelberg game. This work considers the case of a monopoly
retailer where information about customers’ utility and
consumption pattern is available. Jia and Tong [12] use a
Stackelberg game to study the energy consumption scheduling
problem for customers who are subjected to a
time-varying price which is determined one day ahead of
time. The trade-off between consumer surplus and retailer
profit under different pricing schemes is investigated.

Song et al. [13] apply a Markov decision process
(MDP) model to the bidding problem for generators
participating in the electricity market. Gajjar et al. [14]
extend this approach and use actor-critic learning. Gao
et al. [15] present a method for obtaining the bidding
strategy of market participants using parametric linear
programming. However, it assumes that market
participants have complete information on system conditions
and competitor strategies.

Wang et al. [16] formulate the trading of energy by
storage units as a noncooperative game. Under certain
assumptions for the strategy space and utility functions
a Nash equilibrium is shown to exist. An iterative
algorithm is used to reach equilibrium, following which a
double auction is conducted. Mohsenian-Rad et al. [17]
propose a distributed algorithm to obtain the optimal
energy consumption schedule for each user. The problem
of determining the user energy consumption schedule for
the whole day is formulated as a deterministic linear
program. Two problems are considered with two different
objectives: (i) minimize energy cost, and (ii) minimize the
peak to average ratio of demand.

One of the major challenges in the above approaches is
how to elicit optimal demand response without revealing
the inherent dynamic nature of the loads to the ISO. In
this paper, we model the users as stochastic dynamical
systems and generate the optimal demand response in a
decentralized and adaptive manner, thus maximizing the
sum total of the utilities of the users, which in turn allows
for maximum renewable penetration and helps in controlling
price fluctuations.


III. Notation 

Throughout the paper $U_i$, $W_i$, etc. will be used to denote
random variables. Also, random variables will be in
capitals, while their realizations will be in small letters, e.g.,
random variable $X$ and event $\{X = x\}$.

IV. Problem Formulation 

We consider a smart-grid in which there are a total
of $N$ agents. Each agent may be either a consumer or
a producer of electricity. We model time as consisting
of discrete periods. At each discrete time $t$ each agent
$i$ obtains or supplies $u_i(t)$ units of energy (equivalently
power, since it is proportional to energy given the fixed period)
to the grid, with $u_i(t) > 0$ signifying that user $i$ supplies
energy to the grid at time $t$, while $u_i(t) < 0$ signifies
an energy consumption from the grid by the $i$-th user. We
will suppose that there is net energy balance at each time
over the whole grid: $\sum_{i=1}^{N} u_i(t) = 0$ for all $t$. This model
does allow for storage too, since each storage device can
be considered as an agent.

We model each agent as a dynamic system. The
motivation in the case of an agent which is a generator
is that it has ramp up constraints, thus necessitating a
dynamic system model, or in the case of a load it may
have similar ramp down constraints as well as delay in
demand response. The state of user $i$ at time $t$, denoted
by $x_i(t) \in \mathcal{X}_i$, evolves as,

$$x_i(t+1) = f_i(x_i(t), u_i(t)), \quad t = 1, 2, \ldots, T-1. \tag{1}$$

Thus the state of the grid resides in the product space
$\otimes_{i=1}^{N} \mathcal{X}_i$.
We suppose that each agent $i$ has a utility function
$F_i(\cdot) : \mathcal{X}_i \to \mathbb{R}$, with the understanding that the user
prefers a state having higher utility. The total utility of
user $i$ over the horizon $\{1, 2, \ldots, T\}$ is $\sum_{t=1}^{T} F_i(x_i(t))$.

(The theory can be generalized in a straightforward way
to utilities that are time-dependent.) The model (1) can
incorporate constraints on inputs, for example reflecting
bounds on ramp rates, such as $u_i(t) \in \mathcal{U}_i$. In that case,
these constraint sets $\mathcal{U}_i$ are not dualized, but simply carry
over to the dual in (3). For simplicity of exposition we will
not explicitly consider this case in the treatment here, but
will consider such constraints in the numerical examples
in Section VII.

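With the above set-up, we are led to the following
deterministic social welfare optimization problem
(DSWOP):

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{N} \sum_{t=1}^{T} F_i(x_i(t)) \\
\text{subject to} \quad & \sum_{i=1}^{N} u_i(t) = 0, \quad \forall t = 1, 2, \ldots, T, \\
& x_i(t+1) = f_i(x_i(t), u_i(t)), \quad t = 1, 2, \ldots, T-1, \ i = 1, 2, \ldots, N.
\end{aligned}
\tag{DSWOP}
$$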
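To make the structure of (DSWOP) concrete, the following minimal sketch evaluates the social welfare of a candidate input schedule after checking the energy balance constraint. The function name `social_welfare`, the shared dynamics `f`, and the shared utility `F` are illustrative assumptions, not part of the paper; in general each agent has its own $f_i$ and $F_i$.

```python
def social_welfare(initial_states, inputs, f, F, tol=1e-9):
    """Roll out x_i(t+1) = f(x_i(t), u_i(t)) and return sum_i sum_t F(x_i(t)).

    Raises ValueError if the balance sum_i u_i(t) = 0 fails in some period.
    All agents share f and F here purely to keep the sketch short.
    """
    horizon = len(inputs[0])
    for t in range(horizon):
        if abs(sum(u[t] for u in inputs)) > tol:
            raise ValueError(f"energy balance violated in period {t + 1}")
    total = 0.0
    for x, u in zip(initial_states, inputs):
        for t in range(horizon):
            total += F(x)              # utility of the current state
            if t < horizon - 1:
                x = f(x, u[t])         # state update, as in (1)
    return total


def f(x, u):
    return x + u                       # hypothetical integrator dynamics


def F(x):
    return -(x * x)                    # hypothetical quadratic utility


# Two agents, two periods: agent 1 draws one unit, agent 2 supplies it.
welfare = social_welfare([1.0, -1.0], [[-1.0, 0.0], [1.0, 0.0]], f, F)
```

The ISO's difficulty is that evaluating this objective centrally requires every agent's dynamics and utility, which is exactly the information the decentralized scheme avoids collecting.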

Subsequently we will consider the stochastic version of 
the problem caused by uncertainties due to weather, etc. 


V. Optimal Demand Response and Decentralized 
Solution Via Bidding 

The above problem can be interpreted as giving the
ISO the task of determining the $T$-dimensional vectors
$u_i := (u_i(1), u_i(2), \ldots, u_i(T))$, for $i = 1, 2, \ldots, N$, so as to
maximize the social welfare $\sum_{i=1}^{N} \sum_{t=1}^{T} F_i(x_i(t))$. In this
section, we will derive an easy-to-implement algorithm
that does so, while satisfying information and action
decentralization, with all communication between agents
restricted simply to being either price announcements or
purchasing or supply of energy decisions in response to
prices. The ISO simply determines the appropriate prices
causally, while each user optimizes its response causally.

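The Lagrangian for the problem (DSWOP) is,

$$\mathcal{L}(u_1, \ldots, u_N, \lambda) = \sum_{i=1}^{N} \sum_{t=1}^{T} F_i(x_i(t)) + \sum_{t=1}^{T} \lambda(t) \left( \sum_{i=1}^{N} u_i(t) \right), \tag{2}$$

where $\lambda(t)$, $t = 1, 2, \ldots, T$, are the Lagrange
multipliers associated with the constraints $\sum_{i=1}^{N} u_i(t) = 0$,
$t = 1, 2, \ldots, T$, respectively. The Lagrange dual function is,

$$D(\lambda) = \max_{u_1, u_2, \ldots, u_N} \mathcal{L}(u_1, u_2, \ldots, u_N, \lambda) = \max_{u_1, u_2, \ldots, u_N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \left[ F_i(x_i(t)) + \lambda(t) u_i(t) \right] \right). \tag{3}$$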

The objective function (3) can be decomposed agent
by agent since the agents are only coupled by price. Hence
we consider the optimal problem for agent $i$ as one of
maximizing the objective

$$\max_{u_i} \sum_{t=1}^{T} \left[ F_i(x_i(t)) + \lambda(t) u_i(t) \right] \tag{4}$$

for the dynamic system (1). The optimal value is a
function of the initial condition and the Lagrange
multiplier sequence $\lambda = (\lambda(1), \ldots, \lambda(T))$, and we denote it
$V_i(x_i(0), \lambda)$. Therefore,

$$D(\lambda) = \sum_{i=1}^{N} V_i(x_i(0), \lambda).$$
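As an illustration of the per-agent problem, the sketch below computes one agent's best response to a given price vector by brute-force enumeration over a small finite input set. It assumes the convention that an agent is paid `prices[t]` per unit of energy it supplies ($u > 0$); the names `agent_best_response`, `f`, `F`, and the input set are hypothetical, and a practical implementation would use dynamic programming or a convex solver rather than enumeration.

```python
from itertools import product


def agent_best_response(x_init, prices, f, F, inputs):
    """Maximize sum_t [F(x(t)) + prices[t] * u(t)] over input sequences.

    x_init is the state in the first period; f(x, u) gives the next state.
    Enumeration is exponential in the horizon, so this is for tiny cases only.
    """
    best_value, best_u = float("-inf"), None
    for u in product(inputs, repeat=len(prices)):
        x, value = x_init, 0.0
        for u_t, price_t in zip(u, prices):
            value += F(x) + price_t * u_t   # utility plus payment for supply
            x = f(x, u_t)                   # state update
        if value > best_value:
            best_value, best_u = value, u
    return best_value, best_u


def f(x, u):
    return x + u                # hypothetical integrator dynamics


def F(x):
    return -(x - 2) ** 2        # hypothetical utility, peaked at x = 2


# With zero prices the agent simply steers its state toward 2.
value0, bid0 = agent_best_response(0, [0.0, 0.0, 0.0], f, F, (-1, 0, 1))
# A positive price in the last period makes supplying energy worthwhile there.
value1, bid1 = agent_best_response(0, [0.0, 0.0, 0.5], f, F, (-1, 0, 1))
```

Only the resulting bid needs to be reported to the ISO; the agent's `f` and `F` never leave the agent.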

We thus observe that the consideration of the dual
problem has led us to a decentralized problem. Its solution
involves the ISO first announcing the price vector $\lambda$, and
then each agent $i$ simply optimizing its own objective (4)
by determining its vector $u_i$. Thus neither the ISO nor the
other agents need to know the utility function of agent $i$.
The dual problem is to

$$\min_{\lambda} D(\lambda) \quad \text{subject to} \quad \lambda(1), \ldots, \lambda(T) \geq 0. \tag{5}$$

We will suppose that strong duality holds, i.e., the
optimal values of (DSWOP) and (5) are equal. There are
several sufficient conditions for strong duality. For
example, a sufficient condition is for the utility functions
$\sum_{t=1}^{T} F_i(x_i(t))$ to be concave and the feasibility region
of the problem (DSWOP) to be non-empty. Denoting the
optimal solution of the dual problem by $\lambda^*$, we will
suppose that,

$$D(\lambda^*) = \sum_{i=1}^{N} V_i(x_i(0), \lambda^*) = \max_{\sum_{i=1}^{N} u_i(t) = 0, \ \forall t = 1, 2, \ldots, T} \ \sum_{i=1}^{N} \sum_{t=1}^{T} F_i(x_i(t)).$$

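The issue faced by the ISO is how to determine the
optimal price vector $\lambda^*$. Since $D(\lambda)$ as well as the $V_i(x_i(0), \lambda)$
are all convex functions of $\lambda$, we will consider the use of the
sub-gradient method for iterating on the price vector $\lambda$ so
as to converge to the optimal price vector $\lambda^*$. Denoting
the sub-gradient by $\frac{\partial D}{\partial \lambda}$, we note that,

$$\frac{\partial D}{\partial \lambda} = \sum_{i=1}^{N} \frac{\partial V_i}{\partial \lambda} = \left( \sum_{i=1}^{N} u_i^{\lambda}(1), \ \sum_{i=1}^{N} u_i^{\lambda}(2), \ \ldots, \ \sum_{i=1}^{N} u_i^{\lambda}(T) \right),$$

where $(u_i^{\lambda}(1), u_i^{\lambda}(2), \ldots, u_i^{\lambda}(T))$ is the vector that
achieves the optimal utility for the $i$-th user for the price
vector $\lambda$ in (4).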

We see that the iterations on the price vector $\lambda$ generate
the corresponding demand response [18]-[21]. (According to the
Federal Energy Regulatory Commission, demand response
(DR) is defined as: “Changes in electric usage by
end-use customers from their normal consumption patterns in
response to changes in the price of electricity over time, or
to incentive payments designed to induce lower electricity
use at times of high wholesale market prices or when
system reliability is jeopardized.”) We define

$$DR(\lambda) = \| U^{\lambda} \|, \tag{6}$$

where $U^{\lambda}$ contains the vectors $u_i^{\lambda}$ for agents $i$ which are
consumers. DR is a useful quantity because the social
welfare of the grid depends upon it.

Based on the sub-gradient algorithm, we obtain the
following price iteration algorithm. Set $k$, the iteration
index, to 0. The ISO declares a price vector $\lambda^0$ (which is
chosen arbitrarily, but of course preferably close to the
true price vector).

• The users $i$, for $i = 1, 2, \ldots, N$, solve their individual
optimal control problems (4) for the price vector $\lambda^k$ and
calculate the $u_i^k$. Then they separately submit their bids
$u_i^k$ to the ISO.

• The ISO then updates the price vector as
$\lambda^{k+1} = \lambda^k - \alpha \left( \sum_{i=1}^{N} u_i^k \right)$, where $\alpha > 0$ is a step size. Increment
$k$ by one and go to the first step.











There are several choices for the step size $\alpha$, and
several convergence results for the resulting sub-gradient
method [22].
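The price iteration can be sketched in a few lines under strong simplifying assumptions. Here each agent is a hypothetical static agent with per-period objective $-\frac{c}{2}(u - d(t))^2 + \lambda(t) u$, so that its best response $u = d(t) + \lambda(t)/c$ is available in closed form; this stands in for the agent's optimal control problem and is not the model used in the paper. Prices are taken as payments per unit supplied, and the constant step size is an arbitrary choice.

```python
def best_response(targets, curvature, prices):
    """Closed-form bid of a hypothetical quadratic agent for each period."""
    return [d + p / curvature for d, p in zip(targets, prices)]


def price_iteration(agents, horizon, step=0.2, tol=1e-9, max_iter=10_000):
    """Sub-gradient price update: lambda <- lambda - step * (net supply)."""
    prices = [0.0] * horizon                    # arbitrary initial prices
    for k in range(max_iter):
        bids = [best_response(d, c, prices) for d, c in agents]
        imbalance = [sum(b[t] for b in bids) for t in range(horizon)]
        if max(abs(g) for g in imbalance) < tol:
            return prices, bids, k              # energy balance reached
        prices = [p - step * g for p, g in zip(prices, imbalance)]
    return prices, bids, max_iter


# (preferred input per period, curvature); negative targets are consumers.
agents = [
    ([-3.0, -4.0, -2.0], 1.0),
    ([-1.0, -1.5, -1.0], 2.0),
    ([2.0, 2.0, 2.0], 0.5),
]
prices, bids, iterations = price_iteration(agents, horizon=3)
```

In this toy case the ISO sees only the bids; a surplus of supply in a period lowers that period's price, and a deficit raises it, until the bids balance.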

VI. Bidding with Stochastic Renewables and 
Demands 

In the previous section, the dynamics of the users
were assumed to be deterministic, i.e., the exact value
of the system state at the next instant was completely
determined by (1). This might be unrealistic keeping in
mind the stochastic nature of renewable energy as well as
user demands, etc. We begin our discussion with a special
case in which the theory can be fully developed. In this 
case, which we call the Common Completely Observed 
Case, the sources of stochastic uncertainty are known to 
all the agents and observed causally by all of them. This 
could include for example the cloud cover in Denver or 
wind speed and direction in Brazos County in Texas. 

A. Common Observed Randomness Case 

Let $\omega = (\omega(1), \omega(2), \ldots, \omega(T))$ be $T$ primitive
random variables. They can be independent and
identically distributed or Markov. For simplicity, let us suppose that
each $\omega(t)$ assumes values in a finite set. The state of the
$i$-th agent evolves as,

$$X_i(t+1) = f_i(X_i(t), U_i(t), \omega(t+1)),$$

and it is assumed that each agent observes $\omega$ causally in
time, i.e., has access to $\omega(1), \omega(2), \ldots, \omega(t)$ at time $t$. Also
everybody knows the probability law $P$ of $\omega$. That is why
we call this the Common Observed Randomness Case. The
primitive random variables could model the wind speed in
Texas or sunlight in Denver, which everybody has access
to causally, and for which they have a dynamic model.

The problem of interest is then to maximize the utility
function

$$E \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} F_i(X_i(t)) \right\}$$

(Common Known Randomness Problem)

for the $N$ stochastic dynamic systems

$$X_i(t+1) = f_i(X_i(t), U_i(t), \omega(t+1)). \tag{7}$$

Each $U_i(t)$ is required to be $\mathcal{F}_t$-measurable, where $\mathcal{F}_t =
\sigma(\omega(1), \omega(2), \ldots, \omega(t))$ is the sigma-algebra generated by
the random variables $\omega(1), \omega(2), \ldots, \omega(t)$. The inputs
$U_i(t)$, for $i = 1, 2, \ldots, N$, at each $t$ have to satisfy the
constraint

$$\sum_{i=1}^{N} U_i(t) = 0, \quad \text{for each } t = 1, 2, \ldots, T. \tag{8}$$

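We consider the following ISO based approach to solving
this problem. Let $\omega^t = (\omega(1), \omega(2), \ldots, \omega(t))$ be the past of
$\omega$ ($= \omega^T$) until time $t$. The ISO announces a price random
variable $\lambda(\omega) = (\lambda(1, \omega^1), \lambda(2, \omega^2), \ldots, \lambda(T, \omega^T))$ for each
$\omega$.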

Note that the price announcement by the ISO is
actually a policy announcement. (Much like the Federal
Reserve saying that interest rates will rise if there is a
hurricane.) The ISO is saying that if the disturbances
$\omega(1), \omega(2), \ldots, \omega(t)$ hit the system by time $t$, then the price
will be $\lambda(t, \omega^t)$.

Based on this policy announcement, the individual
agents also respond with a policy. Agent $i$ announces a
policy $U_i(1, \omega^1), U_i(2, \omega^2), \ldots, U_i(T, \omega^T)$. The agents
determine their policies individually simply by dynamic
programming, since each knows the probability law for
the stochastic process $\omega$ and its own dynamic system
model.
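The dynamic programming step can be sketched for a single agent as a plain recursion over the observed history. The sketch assumes i.i.d. disturbances with a finite list of `(value, probability)` outcomes, a finite input set, and prices paid per unit supplied; `solve_policy`, `f`, `F`, and `price` are hypothetical names, and the recursion is exponential in the horizon, which is precisely the complexity issue raised in the text.

```python
def solve_policy(t, x, history, T, f, F, price, inputs, outcomes):
    """Return (optimal expected value, optimal input) at time t.

    history is the tuple of disturbances observed up to time t, so the
    announced price price(t, history) may depend on it.
    """
    best_value, best_u = float("-inf"), None
    for u in inputs:
        value = F(x) + price(t, history) * u
        if t < T:
            # Average the continuation value over the next disturbance.
            value += sum(
                p * solve_policy(t + 1, f(x, u, w), history + (w,), T,
                                 f, F, price, inputs, outcomes)[0]
                for w, p in outcomes
            )
        if value > best_value:
            best_value, best_u = value, u
    return best_value, best_u


def f(x, u, w):
    return x + u + w            # hypothetical dynamics with additive noise


def F(x):
    return -(x * x)             # hypothetical quadratic utility


def price(t, history):
    return 0.0                  # hypothetical flat price policy


value, first_input = solve_policy(
    1, 0, (1,), 2, f, F, price, (-1, 0, 1), ((-1, 0.5), (1, 0.5))
)
```

With a zero price the agent keeps its state centered, so the best first input is 0 and the expected value equals minus the noise variance.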

Now we can see that this system is amenable to
the same iteration for prices $\lambda(\omega)$ as before, with the
only modification that the iteration process is repeated
at each time $t$ to determine future policies. To
elaborate, at each time $t$, the ISO first announces the
future price policy as its first iterate, $\lambda^0(t, \omega^t), \lambda^0(t+1, \omega^{t+1}),
\ldots, \lambda^0(T, \omega^T)$. Each agent $i$ then responds with its
future consumption/generation policy $U_i(t, \omega^t), U_i(t+1, \omega^{t+1}),
\ldots, U_i(T, \omega^T)$. The ISO computes whether there
is a net surplus or deficit of energy at each future time,
$\sum_i U_i(t), \sum_i U_i(t+1), \ldots, \sum_i U_i(T)$. Based on
this it iterates to produce a new iterate $\lambda^1(t, \omega^t), \lambda^1(t+1, \omega^{t+1}),
\ldots, \lambda^1(T, \omega^T)$. This iteration can be based
on a sub-gradient method where the increment is
proportional to the energy surplus/deficit vector. Then
the users again respond with the next iterate of the
future consumption/generation policy $U_i(t, \omega^t), U_i(t+1, \omega^{t+1}),
\ldots, U_i(T, \omega^T)$. This continues until there is
convergence.

This is a solution of the problem
(Common Known Randomness Problem) which
leads to optimal utility. The only issue is complexity,
since $\omega$ lies in a set $\Omega$ of huge cardinality $|\Omega|$. Thus, we next
propose an approximation algorithm.
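The growth in complexity is easy to quantify under the assumption that each $\omega(t)$ takes one of $m$ values: there are $m^T$ sample paths, and a price policy must specify one price for every history $\omega^t$, $t = 1, \ldots, T$. The helper names below are illustrative only.

```python
def num_sample_paths(m, T):
    """Number of distinct realizations of (omega(1), ..., omega(T))."""
    return m ** T


def num_price_entries(m, T):
    """Number of histories omega^t, t = 1..T, each needing its own price."""
    return sum(m ** t for t in range(1, T + 1))


# Even a two-valued disturbance over 24 periods is already large.
paths_24 = num_sample_paths(2, 24)
entries_24 = num_price_entries(2, 24)
```

For a two-valued disturbance and $T = 24$, this gives more than sixteen million sample paths, which motivates the approximations that follow.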

Approximation Algorithm with $k$-step Look-ahead

At each time $0 < s < T$, the ISO announces the
prices $\lambda(s+1), \ldots, \lambda(s+k)$ for the next $k$ time periods,
freezing the prices after $k$ periods. Iteration then takes
place over the $k$-dimensional space, and at each step
the iteration tries to achieve energy balance over the
next $k$ time periods via bids. The idea is similar to
Model Predictive Control (MPC), in that optimization is
performed only over a $k$-step look-ahead horizon instead of the entire
horizon $T$, thus giving us a reduction in complexity. This
policy will not approach the optimal policy even as $k \to \infty$,
since it is what is called an “open loop feedback policy”:
at each time the future price sequence is assumed to
be deterministic, rather than a policy that depends on the
full uncertainty state.
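The receding-horizon structure can be sketched as follows, again with hypothetical static quadratic agents whose best response to a price is $d(t) + \lambda(t)/c$ and with a deterministic forecast of the targets $d(t)$. None of these modeling choices come from the paper; the point is only the loop: iterate prices over the next $k$ periods, implement the first price, then shift the window.

```python
def lookahead_prices(window_targets, curvatures, step=0.2, iters=200):
    """Price iteration restricted to the periods in the current window."""
    k = len(window_targets[0])
    prices = [0.0] * k
    for _ in range(iters):
        imbalance = [
            sum(d[t] + prices[t] / c
                for d, c in zip(window_targets, curvatures))
            for t in range(k)
        ]
        prices = [p - step * g for p, g in zip(prices, imbalance)]
    return prices


def receding_horizon(targets, curvatures, k):
    """At each time s, plan k steps ahead and implement only the first price."""
    horizon = len(targets[0])
    applied = []
    for s in range(horizon):
        window = [d[s:s + k] for d in targets]   # shorter near the end
        applied.append(lookahead_prices(window, curvatures)[0])
    return applied


targets = [[-3.0, -4.0, -2.0], [-1.0, -1.5, -1.0], [2.0, 2.0, 2.0]]
curvatures = [1.0, 2.0, 0.5]
applied = receding_horizon(targets, curvatures, k=2)
```

Because the future prices inside each window are treated as a fixed sequence, the sketch shares the open-loop-feedback character described above.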

B. The Partially Observed Randomness Case 

As opposed to the assumption in the previous section,
we consider a more general case where each agent $i$ has a
separate “private” stochastic process $\omega_i = (\omega_i(1), \ldots, \omega_i(T))$
affecting only its system via the equation

$$X_i(t+1) = f_i(X_i(t), U_i(t), \omega_i(t+1)).$$

The stochastic process $\omega_i$ is not completely observed by
the other agents, and only agent $i$ knows the law of the
process $\omega_i$. The objective function and the constraints remain
the same as in (Common Known Randomness Problem).
However, the assumption that an agent does not have
access to the randomness of other agents makes it difficult
to achieve co-operation amongst the agents.

If the goal is to optimize the utility over all
decentralized policies, then the ISO needs to play a
more active role so as to induce co-operation amongst
the agents. More concretely, the ISO needs to know much more
about each individual agent and its dynamic system. It
needs to know the values of the states $X_i(t)$, the utility
functions $F_i(\cdot)$, the dynamics $f_i(X_i(t), U_i(t), \omega_i(t+1))$,
and the probability distributions of the uncertainties $\omega_i$
of each agent $i$. Under this assumption, the ISO can decide
the optimal allocations $U(t)$ for each $t$, as a function
of the state of the system, via dynamic programming.
This procedure suffers from the curse of dimensionality
as the number of users is increased, since it amounts to
nothing less than an optimal solution of the decentralized
problem.

An optimal decentralized solution, where the solution
is itself computed in an iterative decentralized manner,
is an interesting and open problem to pursue. What
we have done above in the Common Completely Observed
Randomness case is to demarcate a problem for which the
solution is precisely known in principle.

We now present another approach, a relaxation, that 
also provides very interesting approximation algorithms 
with much reduced complexity. 

Free Storage Relaxation for The Relaxed Partially
Observed Randomness Problem

Let us assume that the ISO has access to a subset
of the randomness which is denoted by $\omega_{ISO}$,
knows the law of $\omega_{ISO}$, and assume $\omega_{ISO}$ to be a positive
recurrent Markov process. This is the same as assuming
that the ISO of a city has knowledge of the weather of
the city, or has knowledge of events which might alter
the electricity consumption in a big way, and knows the
probability laws governing them. However, the ISO doesn’t
have knowledge of the utility functions $F_i$ of the agents,
their dynamics $f_i$, or the entire randomnesses $\omega_i$. Also it
is assumed that $\omega_{ISO}$ and its law are known to each agent.
The key idea for producing a tractable approximation is
to relax the constraint of energy supply equal to energy
consumption at each time $t$, and along each sample
path of the stochastic uncertainty, i.e., almost surely. We
replace this almost sure equality constraint at each time
$t$ by a conditional expectation of the net power being in
balance:

$$\limsup_{T \to \infty} \frac{1}{T} E \left( \sum_{t=1}^{T} \sum_{i=1}^{N} U_i(t) \, 1\left( \omega_{ISO}(t) = w \right) \right) = 0,$$

where $w$ is any element of the state space of $\omega_{ISO}$.
Intuitively it means that the power-balance constraint
$\sum_{i=1}^{N} U_i(t) = 0$ is allowed to be violated; however, the
fluctuation should balance out over time, conditioned on the
ISO’s observations.

It can be shown that the optimal policy for this case is
for the ISO to declare the price at time $t$ based on the
value of $\omega_{ISO}(t)$, with users then choosing the quantities
$U_i(t)$ based on the value of their state $X_i(t)$ and the
process $\omega_{ISO}(t)$. That is, an agent $i$ need not know the
value of the state $X_j(t)$, $j \neq i$, nor their utility functions
to decide the quantity $U_i(t)$. The analysis and the proof of
optimality in the case of a large number of agents is
analogous to the treatment of multi-armed bandit problems
and activity allocation problems [23]-[26] and uses the
technique of large deviations for Markov processes. We note
that the relaxation provides a precise upper bound on
utility, which is the utility that can be realized in case
there is free storage. It decouples temporal constraints on
energy balance.

The Limited Lookahead Approach for the Partially
Observed Randomness Case

The limited look-ahead approach discussed in Section VI-A,
which resembles Model Predictive Control (MPC), can also be
applied in order to develop an approximation algorithm
for the Partially Observed Randomness Case.

At each time $t$, the ISO fixes a $k$-step random price
vector $\lambda^0$ for the next $k$ time instants. This vector will
depend only upon $\omega_{ISO}$ in a causal way. The agents
respond to this vector by calculating optimal bids $U_i$ for
the future time periods. Then the ISO iterates the price
upon receiving the agents’ bids. The iterations continue
until the changes in the iterates become small enough.

VII. Numerical results 

We illustrate the above algorithms by a simple example.
We start with a deterministic case, followed by a
stochastic case. Throughout this section, we assume that the $N$
users are divided into two groups: users $i \in \{1, \ldots, M\}$ act
as residential consumers and users $i \in \{M+1, \ldots, N\}$ act
as power suppliers.

A. Deterministic case 

We first define the state equations. For consumers, $x_i(t)$
denotes the room temperature for the $i$-th user at time $t$ and
$x_i(t)$ evolves as,

$$x_i(t+1) = a_i x_i(t) + h_i - \beta_i u_i(t), \quad i \in \{1, \ldots, M\},$$

where the $a_i$’s and $\beta_i$’s are constants and $h_i$ denotes ambient
heating. For suppliers, $x_i(t)$ denotes the power
production level for the $i$-th user at time $t$, and it evolves as,

$$x_i(t+1) = a_i x_i(t) + u_i(t), \quad i \in \{M+1, \ldots, N\},$$

with the $a_i$’s being constants.

Fig. 1. Convergence of the price vector for deterministic case

There are natural constraints associated with the state
equations. For consumers, $u_i(t) \leq \frac{1}{\beta_i}(h_i + c_i)$, where $c_i$ is
the maximal cooling rate. For suppliers, $u_i(t) \leq r_i$, where
$r_i$ is the maximal ramp rate allowed.
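A minimal sketch of one step of these state equations, with the input clipped to the stated upper bounds, is given below. The function and parameter names are illustrative, and clipping is only one simple way to enforce the constraints; in the paper they enter the agents' optimization problems instead.

```python
def consumer_step(x, u, a, h, beta, c):
    """Room temperature update: x(t+1) = a*x(t) + h - beta*u(t).

    The cooling input is capped at (h + c) / beta, with c the maximal
    cooling rate.
    """
    u = min(u, (h + c) / beta)
    return a * x + h - beta * u


def supplier_step(x, u, a, r):
    """Production level update: x(t+1) = a*x(t) + u(t), with u(t) <= r."""
    u = min(u, r)
    return a * x + u


# With a = h = beta = 1, one unit of cooling exactly offsets ambient heating.
steady = consumer_step(22.0, 1.0, a=1.0, h=1.0, beta=1.0, c=0.5)
# A larger request is capped at (h + c) / beta = 1.5.
capped = consumer_step(22.0, 5.0, a=1.0, h=1.0, beta=1.0, c=0.5)
# A supplier asked to ramp by 3 with maximal ramp rate 1 only moves by 1.
ramped = supplier_step(10.0, 3.0, a=1.0, r=1.0)
```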

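We now define the utility functions $F_i$. For consumers,
let

$$F_i(x_i(t)) = -\left( x_i(t) - \frac{\phi_{1i} + \phi_{2i}}{2} \right)^2 + m_i - \lambda(t) u_i(t),$$

where $[\phi_{1i}, \phi_{2i}]$ is the $i$-th user’s “comfort bounds” for
temperature, the $m_i$’s are constants and $\lambda(t)$ is the price. For suppliers,

$$F_i(x_i(t)) = \lambda(t) x_i(t) - \left( C_{1i} x_i^2(t) + C_{2i} x_i(t) + C_{3i} + C_{4i} u_i(t) \right),$$

where $C_{1i}$, $C_{2i}$, $C_{3i}$ and $C_{4i}$ are cost coefficients for the $i$-th
user.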

In this case, the state equations and constraints are
linear and the objective functions are quadratic, so we
use QCQP (Quadratically Constrained Quadratic Programming)
to solve the problem. For simplicity, we let $M = 5$,
$N = 10$, $h_i = \beta_i = 1$, $m_i = 2$, $a_i = 1$ and choose $\phi_{1i}$
uniformly from $[20,21]$, $\phi_{2i}$ from $[24,25]$, from $[0.5,1.5]$,
$c_{1i}$ from $[0.9,1.1]$, $c_{2i}$ from $[0.1,0.3]$, $c_{3i}$ from $[0.5,1.0]$
and $c_{4i}$ from $[0.1,0.5]$.
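The parameter choices above can be reproduced with a short script. The sketch below samples the comfort bounds and cost coefficients from the stated intervals and evaluates the two utility functions as we read them from the definitions in this section (a quadratic comfort penalty around the midpoint of the comfort bounds for consumers, revenue minus a quadratic cost for suppliers). The random seed, variable names and function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for reproducibility
M, N = 5, 10

# Consumer parameters (users 1..M).
phi1 = rng.uniform(20.0, 21.0, size=M)
phi2 = rng.uniform(24.0, 25.0, size=M)
m = np.full(M, 2.0)

# Supplier cost coefficients (users M+1..N).
c1 = rng.uniform(0.9, 1.1, size=N - M)
c2 = rng.uniform(0.1, 0.3, size=N - M)
c3 = rng.uniform(0.5, 1.0, size=N - M)
c4 = rng.uniform(0.1, 0.5, size=N - M)

def consumer_utility(x, u, price):
    """Comfort term around the midpoint of the comfort bounds, minus payment."""
    target = (phi1 + phi2) / 2.0
    return -(x - target) ** 2 + m - price * u

def supplier_utility(x, u, price):
    """Revenue at the given price minus the production cost."""
    return price * x - (c1 * x ** 2 + c2 * x + c3 + c4 * u)

# A consumer sitting exactly at its comfort midpoint and buying nothing gets m_i.
print(consumer_utility((phi1 + phi2) / 2.0, np.zeros(M), price=1.0))
```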

Fig. 1 plots the evolution of the price vector; for
legibility, we plot only 4 components of $\lambda$. It
is easy to see that $\lambda$ converges quite fast, in less than 20
iterations.

Fig. 2 shows the demand response value as a function
of iterations. For the deterministic case, this is simply the
norm of the matrix $U$, where $U$ is an $M \times T$ matrix with
$U_{ij} = U_i(j)$. Here we use the Frobenius norm, defined by
$\|U\|_F = \sqrt{\sum_i \sum_j U_{ij}^2}$. We can see that the demand
response approaches a constant as the iterations proceed.
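For concreteness, the demand response value defined above can be computed as follows. The function name and the small example matrix are hypothetical; the result is checked against NumPy's built-in Frobenius norm.

```python
import numpy as np

def demand_response_value(U):
    """Frobenius norm of the M x T matrix of consumer bids, U[i, j] = U_i(j)."""
    U = np.asarray(U, dtype=float)
    return float(np.sqrt(np.sum(U ** 2)))

# Hypothetical 2 x 2 bid matrix.
U = np.array([[3.0, 0.0],
              [0.0, 4.0]])
value = demand_response_value(U)
print(value, np.linalg.norm(U, "fro"))
```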



Fig. 2. Demand Response for deterministic case 


B. Stochastic case 

We adopt the same notation as in the deterministic
case, but modify the state equations by adding a random
variable that models the availability of renewables or the
stochasticity of demand. For consumers,

$$X_i(t+1) = a_i X_i(t) + h_i - \beta_i U_i(t) + W(t), \quad i \in \{1,\dots,M\},$$

where $W(t)$ is not necessarily i.i.d. because of geographical
and temporal correlation. Similarly, for suppliers,

$$X_i(t+1) = a_i X_i(t) + U_i(t) + V(t), \quad i \in \{M+1,\dots,N\},$$

where $V(t)$ is not necessarily i.i.d. either.

In our simulation, for simplicity, we let $W(t)$ assume
two values drawn uniformly from $[-0.5, 0.5]$, each with
probability 0.5, and let $V(t)$ also take two values drawn
uniformly from $[-0.2, 0.2]$, each with probability 0.5.
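One way to read this noise model is that two support points are first drawn uniformly from the stated interval, and the noise then takes either point with probability 0.5 at each time step. The sketch below follows that reading. It is an assumption on our part, as are the seed, the horizon of 24 steps and the independent sampling across time.

```python
import numpy as np

def two_point_noise(rng, half_width, horizon):
    """Draw two support points uniformly from [-half_width, half_width],
    then pick one of them with probability 0.5 at each time step."""
    support = rng.uniform(-half_width, half_width, size=2)
    picks = rng.integers(0, 2, size=horizon)
    return support, support[picks]

rng = np.random.default_rng(0)  # arbitrary seed
w_support, w_path = two_point_noise(rng, 0.5, horizon=24)  # consumer noise W(t)
v_support, v_path = two_point_noise(rng, 0.2, horizon=24)  # supplier noise V(t)
print(w_support, v_support)
```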

For each step in the model predictive control approach,
the price vector $\lambda$ converges within 20 iterations, just as
it does in the deterministic case. Fig. 3 plots the demand
response value as a function of the iterations. Let $Q(t)$ be
the vector containing only the $U_i$'s for $i \in \{1,\dots,M\}$. For
display purposes, we plot only the first 4 steps and adopt
the norm. As the optimization window moves, $\|Q\|$
converges faster; whether it converges from above or
below depends on the initial value.

C. Comparison with current ISO 

The current ISO sets the price as follows. At time $t$,
it assumes that the demand $D(t)$ is given and, based on
the previous step's production level $X_i(t-1)$ and the
marginal cost of each producer, it assigns the production
level $X_i(t)$ for each producer so as to minimize the
total production cost at time $t$. The associated Lagrange
multiplier is then the price at time $t$. We set $a_i = 3$
for $i \in \{1,\dots,M\}$, and keep the same values for the other
parameters. Each consumer tries to keep $X_i(t)$ at a prescribed
value for all $t$, and the resulting $U_i(t)$ is used to calculate
the demand input $D(t)$ to the ISO scheme, where
$D(t) := \sum_{i=1}^{M} U_i(t)$.
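To make the role of the Lagrange multiplier concrete, here is a minimal sketch of a one-period dispatch with quadratic production costs. It is a simplification made for illustration, not the ISO's actual formulation: it keeps only the $c_{1i} X_i^2 + c_{2i} X_i$ part of the cost, ignores the ramp-rate bound and nonnegativity of production, and solves the stationarity conditions in closed form. The function name and the example numbers are hypothetical.

```python
import numpy as np

def greedy_dispatch(demand, c1, c2):
    """Minimise sum(c1*x**2 + c2*x) subject to sum(x) == demand.

    Stationarity gives 2*c1_i*x_i + c2_i = price for every producer, so the
    price is the common marginal cost and x_i = (price - c2_i) / (2*c1_i).
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    inv = 1.0 / (2.0 * c1)
    price = (demand + np.sum(c2 * inv)) / np.sum(inv)
    production = (price - c2) * inv
    return production, price

# Two hypothetical producers serving a demand of 5 units.
production, price = greedy_dispatch(5.0, c1=[0.9, 1.1], c2=[0.1, 0.3])
print(production, price)
```

Because this dispatch looks only one period ahead, the price it produces reacts to each new value of $D(t)$ in isolation, which is the sense in which the text later calls the ISO scheme greedy.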
















Fig. 3. Demand Response for stochastic case 



Fig. 4. Variance of price 


Fig. 4 summarizes the results. With the uncertainty
magnitude fixed at $\|W\| = 0$, the figure on the left shows that
the price generated by the ISO scheme has a bigger
variance $\sigma^2_{ISO}$ than $\lambda$, which is obtained by our iterative
approach. Moreover, the difference in variance becomes
even larger as $|a_i|$ increases. The figure on the right fixes
$|a_i| = 1$ and plots the variance of price as a
function of $\|W\|$. As in the left figure, the difference
in variance increases as $\|W\|$ increases.

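Next we compare the total utility of the entire system
obtained by the two approaches. Notice that in our case
the total utility is

$$\sum_{t=1}^{T} \sum_{i=1}^{N} F_i = \sum_{t=1}^{T} \left[ \sum_{i=1}^{M} \left( -\left(X_i(t) - \frac{\phi_{1i} + \phi_{2i}}{2}\right)^2 + m_i \right) - \sum_{i=M+1}^{N} \left( c_{1i} X_i^2(t) + c_{2i} X_i(t) + c_{3i} + c_{4i} U_i(t) \right) \right],$$

as the $\lambda$ terms cancel out. We calculate the total utility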
incurred by the two schemes; the results are shown in
Table I. It can be seen that the total utility obtained by
our dynamic iterative approach is roughly three times the
total utility obtained by the greedy ISO scheme. (Note
that this coincides with the setting of $a_i = 3$.)

TABLE I
Total utility obtained by ISO scheme and the iterative approach

Name                      Value
The iterative approach    427.1932
ISO scheme                142.8451

TABLE II
Total expected utility obtained by ISO scheme and the iterative approach

Name                Value
The MPC approach    500.2578
ISO scheme          159.2198

Fig. 5. Changes in total utility as $|a_i|$ or $\|W\|$ varies
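The cancellation of the price terms in the total utility can be checked directly. The short derivation below uses the consumer and supplier utilities as we read them from Section VII-A, and it assumes that the market clears at each time $t$, i.e., total production equals total consumption; that balance condition is our assumption about why the terms cancel.

```latex
\begin{align*}
\sum_{i=1}^{N} F_i
  &= \sum_{i=1}^{M} \left[ -\left( X_i(t) - \frac{\phi_{1i} + \phi_{2i}}{2} \right)^2 + m_i \right]
   - \sum_{i=M+1}^{N} \left[ c_{1i} X_i^2(t) + c_{2i} X_i(t) + c_{3i} + c_{4i} U_i(t) \right] \\
  &\quad + \lambda(t) \left[ \sum_{i=M+1}^{N} X_i(t) - \sum_{i=1}^{M} U_i(t) \right].
\end{align*}
% Under the assumed balance condition
%   \sum_{i=M+1}^{N} X_i(t) = \sum_{i=1}^{M} U_i(t),
% the last bracket is zero, so payments made by consumers equal
% revenues received by suppliers and the price drops out of the sum.
```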

The above considers the case where demand is deterministic.
When there is randomness on the demand
side, the ISO scheme aims at minimizing the expected
production cost at each time $t$. We assume for simplicity
that the added noise term $W(t)$ is i.i.d. and takes values
0.5 and $-0.5$, each with probability 0.5. As in
the deterministic case, we calculate the total expected
utility incurred by the two schemes and present the
results in Table II. It can be seen that the total expected
utility obtained by the MPC approach is more than three
times the expected utility obtained by the ISO scheme.
We conclude that in both the deterministic and stochastic
cases, our approach provides greater total utility than the
current ISO scheme.
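The two "three times" statements can be verified from the reported values in Tables I and II with a few lines of arithmetic. The variable names are ours; the numbers are taken from the tables.

```python
# Values reported in Table I (deterministic) and Table II (stochastic).
utility_deterministic = {"iterative": 427.1932, "iso": 142.8451}
utility_stochastic = {"mpc": 500.2578, "iso": 159.2198}

ratio_deterministic = utility_deterministic["iterative"] / utility_deterministic["iso"]
ratio_stochastic = utility_stochastic["mpc"] / utility_stochastic["iso"]

print(f"deterministic ratio: {ratio_deterministic:.2f}")
print(f"stochastic ratio:    {ratio_stochastic:.2f}")
```

The ratios come out at about 2.99 and 3.14, consistent with "roughly three times" in the deterministic case and "more than three times" in the stochastic case.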

We also compare the change in total utility obtained by
the two schemes as a function of $|a_i|$ and of the noise
magnitude $\|W\|$. We first fix $|a_i| = 1$
and let $W(t)$ be i.i.d., taking values $\|W\|$ and $-\|W\|$
with probability 0.5 each, and observe
the change in total utility while increasing the noise
magnitude $\|W\|$. The result is shown in the left plot in
Fig. 5. It can be seen that as $\|W\|$ increases, the total
utility obtained by the MPC approach is not a strict linear function
of the utility obtained by the ISO scheme. The plot on the
right fixes $\|W\| = 0$ and shows that as $|a_i|$ increases, the
difference in utility obtained increases as well.

VIII. Concluding Remarks 

We have formulated the problem of allocating power
demand and generation among heterogeneous
energy-consuming or energy-producing agents, or prosumers,
connected to a smart-grid in a dynamic fashion, both
in a deterministic setting and in a stochastic setting
where underlying uncertainties affect
generation as well as consumption. We have proposed
decentralized iterative algorithms to solve this problem.
These algorithms work under the assumption of local
knowledge, i.e., an agent needs to keep track of only its
own randomness and its own system dynamics. We have
shown that the ISO can play a central role in inducing
cooperation amongst the agents by declaring policies. A
possibly significant theoretical contribution is that in the
common completely observed randomness case, there is
an ISO strategy that achieves social welfare optimality.
It incorporates decentralized dynamics, where there is no
need for agents to be aware of each other's dynamics or
states. The only communication from the ISO is the price
policy, and from the agents their energy consumption
or generation in response to the price. We have also
proposed more computationally tractable policies for this
case. For the Partially Observed Randomness case,
we have further indicated a relaxation that significantly
reduces complexity, as well as an MPC approach that is
tractable. Simulation results are provided which show that the
proposed algorithms appear to outperform current ISO
practices in terms of net social welfare.
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